EFFICIENT MOTION VECTOR CODING FOR VIDEO COMPRESSION

FIELD OF THE INVENTION 
The invention relates to video coding, and specifically, to an improved method
for coding motion vectors.

BACKGROUND OF THE INVENTION 
Full-motion video displays based upon analog video signals have long been
available in the form of television. With recent advances in computer processing
capabilities and affordability, full-motion video displays based upon digital video signals
are becoming more widely available. Digital video systems can provide significant 
improvements over conventional analog video systems in creating, modifying, 
transmitting, storing, and playing full-motion video sequences. 

Digital video displays include large numbers of image frames that are played or
rendered successively at frequencies of between 30 and 75 Hz. Each image frame is a
still image formed from an array of pixels based on the display resolution of a particular
system. As examples, VHS-based systems have display resolutions of 320x480
pixels, NTSC-based systems have display resolutions of 720x486 pixels, and
high-definition television (HDTV) systems under development have display resolutions of
1360x1024 pixels.

The amounts of raw digital information included in video sequences are
massive. Storage and transmission of these amounts of video information are infeasible
with conventional personal computer equipment. Consider, for example, a digitized
form of a relatively low resolution VHS image format having a 320x480 pixel
resolution. A full-length motion picture of two hours in duration at this resolution
corresponds to 100 gigabytes of digital video information. By comparison,
conventional compact optical disks have capacities of about 0.6 gigabytes, magnetic
hard disks have capacities of 1-2 gigabytes, and compact optical disks under
development have capacities of up to 8 gigabytes.
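
By way of illustration only, the 100 gigabyte figure can be checked with simple arithmetic. The sketch below assumes 3 bytes per pixel and 30 frames per second; neither value is stated in the text, and the function name is hypothetical.

```python
# Rough check of the raw data size quoted above.
# Assumptions (not stated in the text): 3 bytes per pixel, 30 frames per second.
WIDTH, HEIGHT = 320, 480
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 30
DURATION_SECONDS = 2 * 60 * 60  # two hours


def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Return the uncompressed size of a video sequence in bytes."""
    return width * height * bytes_per_pixel * fps * seconds


total = raw_video_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL,
                        FRAMES_PER_SECOND, DURATION_SECONDS)
print(f"{total / 1e9:.1f} GB")  # prints "99.5 GB"
```

Under these assumptions the raw sequence is roughly 150 times larger than a 0.6 gigabyte compact optical disk.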

To address the limitations in storing or transmitting such massive amounts of 
digital video information, various video compression standards or processes have been 
established, including MPEG-1, MPEG-2, and H.26X. These video compression 
techniques utilize similarities between successive image frames, referred to as
temporal or interframe correlation, to provide interframe compression in which motion
data and error signals are used to encode changes between frames. 

In addition, the conventional video compression techniques utilize similarities 
within image frames, referred to as spatial or intraframe correlation, to provide
intraframe compression in which the image samples within an image frame are
compressed. Intraframe compression is based upon conventional processes for 
compressing still images, such as discrete cosine transform (DCT) encoding. This type 
of coding is sometimes referred to as "texture" or "transform" coding. A "texture" 
generally refers to a two-dimensional array of image sample values, such as an array of 
chrominance and luminance values or an array of alpha (opacity) values. The term 
"transform" in this context refers to how the image samples are transformed into 
spatial frequency components during the coding process. This use of the term 
"transform" should be distinguished from a geometric transform used to estimate 
scene changes in some interframe compression methods. 

Interframe compression typically utilizes motion estimation and compensation 
to encode scene changes between frames. Motion estimation is a process for 
estimating the motion of image samples (e.g., pixels) between frames. Using motion 
estimation, the encoder attempts to match blocks of pixels in one frame with 
corresponding pixels in another frame. After the most similar block is found in a given 
search area, the change in position of the pixel locations of the corresponding pixels is 
approximated and represented as motion data, such as a motion vector. Motion 
compensation is a process for determining a predicted image and computing the error 
between the predicted image and the original image. Using motion compensation, the 
encoder applies the motion data to an image and computes a predicted image. The 
difference between the predicted image and the input image is called the error signal. 
Since the error signal is just an array of values representing the difference between 
image sample values, it can be compressed using the same texture coding method as 
used for intraframe coding of image samples. 
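
By way of illustration only, block matching and the resulting error signal can be sketched as follows. The sum of absolute differences cost, the exhaustive search, the small block size, and all function names are assumptions chosen for brevity; the text does not prescribe a particular matching criterion or search strategy.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_motion(cur, ref, cx, cy, size=4, search=2):
    """Exhaustive search around (cx, cy) in a same-sized reference frame.

    Returns ((dx, dy), error_block). The vector is the displacement from the
    current block to its best match in ref (an assumed sign convention).
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall partly outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    dx, dy = best
    # Error signal: current block minus the motion-compensated prediction.
    error = [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(size)]
        for j in range(size)
    ]
    return best, error
```

When the match is exact, the error block is all zeros and compresses to almost nothing under texture coding.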

Although differing in specific implementations, the MPEG-1, MPEG-2, and 
H.26X video compression standards are similar in a number of respects. The following 
description of the MPEG-2 video compression standard is generally applicable to the 
others. 

MPEG-2 provides interframe compression and intraframe compression based 
upon square blocks or arrays of pixels in video images. A video image is divided into 
image sample blocks called macroblocks having dimensions of 16 x 16 pixels. In 
MPEG-2, a macroblock comprises four luminance blocks (each block is 8 x 8 samples 
of luminance (Y)) and two chrominance blocks (one 8x8 sample block each for Cb 
and Cr). 

In MPEG-2, interframe coding is performed on macroblocks. An MPEG-2 
encoder performs motion estimation and compensation to compute motion vectors and 
block error signals. For each block M_N in an image frame N, a search is performed
across the image of a next successive video frame N+1 or immediately preceding
image frame N-1 (i.e., bi-directionally) to identify the most similar respective blocks
M_{N+1} or M_{N-1}. The location of the most similar block relative to the block M_N is
encoded with a motion vector (DX,DY). The motion vector is then used to compute a
block of predicted sample values. These predicted sample values are compared with
block M_N to determine the block error signal. The error signal is compressed using a
texture coding method such as discrete cosine transform (DCT) encoding. 

Object-based video coding techniques have been proposed as an improvement 
to the conventional frame-based coding standards. In object-based coding, arbitrary 
shaped image features are separated from the frames in the video sequence using a 
method called "segmentation." The video objects or "segments" are coded 
independently. Object-based coding can improve the compression rate because it 
increases the interframe correlation between video objects in successive frames. It is 
also advantageous for a variety of applications that require access to and tracking of
objects in a video sequence. 

In the object-based video coding methods proposed for the MPEG-4 standard, 
the shape, motion and texture of video objects are coded independently. The shape of 
an object is represented by a binary or alpha mask that defines the boundary of the 
arbitrary shaped object in a video frame. The motion of an object is similar to the 
motion data of MPEG-2, except that it applies to an arbitrary-shaped image of the 
object that has been segmented from a rectangular frame. Motion estimation and 
compensation is performed on blocks of a "video object plane" rather than the entire 
frame. The video object plane is the name for the shaped image of an object in a 
single frame. 

The texture of a video object is the image sample information in a video object 
plane that falls within the object's shape. Texture coding of an object's image 
samples and error signals is performed using similar texture coding methods as in 
frame-based coding. For example, a segmented image can be fitted into a bounding 
rectangle formed of macroblocks. The rectangular image formed by the bounding 
rectangle can be compressed just like a rectangular frame, except that transparent 
macroblocks need not be coded. Partially transparent blocks are coded after filling in 
the portions of the block that fall outside the object's shape boundary with sample 
values in a technique called "padding." 

In both frame-based and object-based video coding, the encoded bit stream 
typically includes many interframe-coded frames (P frames). Each of these P frames 
includes at least one motion vector per macroblock, and each motion vector includes X 
and Y components that are coded independently. As such, motion vectors contribute a
significant amount of data for each coded P frame. There is a need, therefore, for
more efficient motion vector coding schemes. 

SUMMARY OF THE INVENTION 
The invention provides an improved method of coding motion vectors for video
coding applications. One aspect of the invention is a method for jointly coding a
motion vector with a single entropy code. This method is based on the discovery that
the probabilities of the X and Y components of the motion vector are not totally
independent. To exploit the correlation between the motion vector components, the
method uses entropy coding to assign a single variable length code to a joint
parameter representing the combined X and Y components of the motion vector.
Motion vector component pairs that are more likely are assigned a shorter length code,
while less likely component pairs are assigned a longer length code or are coded with
an escape code followed by a code for each component. This approach can be used
in a variety of video coding applications, including both object-based and frame-based
coding. In addition, joint entropy coding of motion vectors can be used in combination
with spatial prediction to code motion vectors more efficiently.

For example, in one implementation, an encoder first computes a predictor for 
the motion vector, and then computes differential X and Y components from the X and 
Y components of the vector currently being processed and its predictor. A joint
entropy coder then computes a single variable length code for a joint parameter 
representing both the X and Y differential components. 

The decoder performs the inverse of the encoder operations to reconstruct the 
motion vector from the variable length code. In particular, it computes the joint 
parameter from the variable length code, and then reconstructs the motion vector from
the differential components and the components of the predictor. 

Additional features of the invention will become more apparent from the 
following detailed description and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a video coder. 
Fig. 2 is a block diagram of a video decoder. 

Fig. 3 is a block diagram illustrating how an implementation of the invention 
jointly codes motion vector components for a macroblock with a single entropy code. 
Fig. 4 is a diagram illustrating how a predictor for the motion vector of a
current block is selected from motion vectors of neighboring macroblocks.

Fig. 5 is a diagram illustrating how a motion vector predictor is selected in
cases where one or more neighboring macroblocks are outside the picture. 

Fig. 6 is a block diagram illustrating how an implementation of the invention 
decodes a jointly coded motion vector. 

Fig. 7 is a diagram of a computer system that serves as an operating 
environment for a software implementation of the invention. 

DETAILED DESCRIPTION 

Introduction 

The first section below provides a description of a video encoder and decoder. 
Subsequent sections describe how to improve coding of motion vectors by exploiting 
the correlation between the X and Y components of the vectors. 

This approach for jointly coding the X and Y components of a motion vector 
applies to both frame-based and object-based video coding. Both forms of video 
coding employ motion vectors to define the motion of a pixel or block of pixels from 
one frame to another. Typically, a motion vector is computed for regular sized blocks 
of pixels. In frame-based coding, the frame is divided into regular sized blocks. In 
object-based coding, each video object plane is divided into blocks. Since the object 
represented in a video object plane usually has a non-rectangular shape, object-based 
coders use the shape to determine which pixels in each block fall within the 
boundaries of the object. While frame-based and object-based coding differ in this 
respect, both approaches use motion vectors that define the motion of pixels in a 
block. Thus, the correlation between the X and Y components of motion vectors in 
both types of coders can be exploited to improve coding efficiency. 

While the encoder and decoder described in the next section are object-based, 
they provide a sufficient basis for explaining how to implement the invention in both 
frame-based and object-based coding schemes. 



Description of an Example Encoder and Decoder 

Fig. 1 is a block diagram illustrating an implementation of an object-based 
video encoder. The input 30 to the encoder includes images representing the video 
objects in each frame, the shape of each video object and bounding rectangles. The 
shape information is available before the encoder codes texture or motion data. 
Frame-based coding differs in that the entire frame is coded without shape 
information, and the input 30 consists of a series of image frames. 

The shape coding module 32 reads the definition of an object including its 
bounding rectangle and extends the bounding rectangle to integer multiples of
macroblocks. The shape information for an object comprises a mask or "alpha plane."
The shape coding module 32 reads this mask and compresses it, using for example, a 
conventional chain coding method to encode the contour of the object. 

Motion estimation module 34 reads an object including its bounding rectangle
and a previously reconstructed image 36 and computes motion estimation data used to
predict the motion of an object from one frame to another. The motion estimation
module 34 searches for the most similar macroblock in the reconstructed image for
each macroblock in the current image to compute a motion vector for each
macroblock. The specific format of the motion vector from the motion estimation
module 34 can vary depending on the motion estimation method used. In the
implementation described below, there is a motion vector for each macroblock, which
is consistent with current MPEG and H.26X formats.

The motion compensation module 38 reads the motion vectors computed by
the motion estimation module and the previously reconstructed image 36 and
computes a predicted image for the current frame. Each pixel in the predicted image is
constructed by using the motion vector for the macroblock that it resides in to find the
corresponding pixel in the previously reconstructed image 36. The encoder then finds
the difference between the image sample values in the input image block as specified
in the input 30 and the corresponding sample values in the predicted image block as
computed in the motion compensation module 38 to determine the error signal for the
macroblock.

Texture coding module 40 compresses this error signal for inter-frame coded 
objects and compresses image sample values for the object from the input data stream 
30 for intra-frame coded objects. The feedback path 42 from the texture coding
module 40 represents the error signal. The encoder uses the error signal blocks along
with the predicted image blocks from the motion compensation module to compute the 
previously reconstructed image 36. 

The texture coding module 40 codes intra-frame and error signal data for an 
object using any of a variety of still image compression techniques. Example
compression techniques include DCT, wavelet, as well as other conventional image
compression methods. 

The bit stream of the compressed video sequence includes the shape, motion 
and texture coded information from the shape coding, motion estimation, and texture 
coding modules. Multiplexer 44 combines and formats this data into the proper syntax
and outputs it to the buffer 46. As explained in more detail below, the encoder also
includes a motion vector encoder that uses entropy coding to jointly code the X and Y
components of the motion vector for each macroblock. The motion vector encoder
may be implemented as part of the motion estimation module 34 or as part of the data
formatting functions in the multiplexer 44. 

While the encoder can be implemented in hardware or software, it is most 
likely implemented in software. In a software implementation, the modules in the 
encoder represent software instructions stored in memory of a computer and executed 
in the processor, and the video data stored in memory. A software encoder can be 
stored and distributed on a variety of conventional computer readable media. In 
hardware implementations, the encoder modules are implemented in digital logic, 
preferably in an integrated circuit. Some of the encoder functions can be optimized in 
special-purpose digital logic devices in a computer peripheral to off-load the processing 
burden from a host computer. 

Fig. 2 is a block diagram illustrating a decoder for an object-based video coding 
method. A demultiplexer 60 receives a bit stream representing a compressed video 
sequence and separates shape, motion and texture encoded data on an object-by-object
basis. The demultiplexer also includes a motion vector decoder that
reconstructs the motion vector for each macroblock from a single variable length code. 

Shape decoding module 64 decodes the shape or contour for the current object 
being processed. To accomplish this, it employs a shape decoder that implements the 
inverse of the shape encoding method used in the encoder of Fig. 1. The resulting
shape data is a mask, such as a binary alpha plane or gray scale alpha plane 
representing the shape of the object. 

The motion decoding module 66 decodes the motion information in the bit 
stream. The decoded motion information includes the motion vectors for each 
macroblock that are reconstructed from entropy codes in the incoming bitstream. The 
motion decoding module 66 provides this motion information to the motion 
compensation module 68, and the motion compensation module 68 uses the motion 
vectors to find predicted image samples in the previously reconstructed object data 70. 

The texture decoding module 74 decodes error signals for inter-frame coded 
texture data and an array of color values for intra-frame texture data and passes this 
information to a module 72 for computing and accumulating the reconstructed image. 
For inter-frame coded objects, this module 72 applies the error signal data to the 
predicted image output from the motion compensation module to compute the 
reconstructed object for the current frame. For intra-frame coded objects the texture 
decoding module 74 decodes the image sample values for the object and places the 
reconstructed object in the reconstructed object module 72. Previously reconstructed 
objects are temporarily stored in object memory 70 and are used to construct the 
object for other frames.

Like the encoder, the decoder can be implemented in hardware, software or a
combination of both. In software implementations, the modules in the decoder are 
software instructions stored in memory of a computer and executed by the processor, 
and video data stored in memory. A software decoder can be stored and distributed 
on a variety of conventional computer readable media. In hardware implementations, 
the decoder modules are implemented in digital logic, preferably in an integrated 
circuit. Some of the decoder functions can be optimized in special-purpose digital 
logic devices in a computer peripheral to off-load the processing burden from a host 
computer. 

Improved Coding of Motion Vectors 

The coding efficiency of motion vectors can be improved by exploiting the 
correlation between the X and Y components of a motion vector. Traditional coding 
methods code the X and Y components separately based on the premise that the
probability distributions of the X and Y components are independent. We have
discovered that the X and Y components are not totally independent, but instead, have 
a correlation. 

To take advantage of this correlation, an implementation of the invention 
assigns a single entropy code to the joint X and Y components of a motion vector. 
Before coding, sample video data for a target bit rate and content scenario is used to 
generate a codebook. This codebook assigns a single variable length code to pairs of 
X and Y components based on their frequency of occurrence. More frequent, and 
therefore statistically more probable pairs, are assigned shorter length codes, while 
less frequent pairs are assigned longer length codes. A statistical analysis program 
computes the probability of each of the joint X and Y components by extracting the 
motion vector data generated from an encoder for several example video sequences 
that have the desired type of content. The program creates a probability distribution
for pairs of motion vector components (namely, differential motion vector components)
and then assigns codes to the subset of pairs that are most probable.

To limit the size of the codebook, low probability pairs need not be assigned a 
code. Instead, these pairs can be coded by using an escape code to indicate that the 
motion vector components follow in fixed-length bit fields. Pairs are excluded from the
codebook based on where they fall in the probability distribution. 
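
By way of illustration only, codebook training of this kind can be sketched as follows. The sketch counts joint (X, Y) pairs, keeps the most frequent ones, and lumps the remaining pairs into a single escape entry before building a Huffman code. The `ESCAPE` symbol, the `max_entries` cutoff, and the function name are assumptions; the text says only that pairs are excluded based on where they fall in the probability distribution.

```python
import heapq
from collections import Counter
from itertools import count

ESCAPE = "ESC"  # hypothetical symbol standing for every pair left out of the codebook


def train_joint_codebook(pairs, max_entries):
    """Build a Huffman codebook over the most frequent (dx, dy) pairs.

    Pairs outside the max_entries most frequent ones share one ESCAPE entry
    whose weight is their combined count.
    """
    freq = Counter(pairs)
    kept = dict(freq.most_common(max_entries))
    escaped = sum(freq.values()) - sum(kept.values())
    weights = dict(kept)
    weights[ESCAPE] = max(escaped, 1)  # keep ESCAPE codable even if unseen in training

    tiebreak = count()  # unique counter so equal weights never compare the dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in weights}
    while len(heap) > 1:
        w0, _, low = heapq.heappop(heap)
        w1, _, high = heapq.heappop(heap)
        # Merge the two lightest subtrees, prepending one bit to every code.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]
```

A smaller `max_entries` shrinks the table at the cost of sending more vectors through the longer escape path.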

While not required, the coding of motion vectors can be improved by using a 
differential coding process that takes advantage of the spatial dependency of motion 
vectors. In particular, a motion vector for a small block of pixels is likely to point in a 
similar direction as the motion vector for a neighboring block, especially if both the
current block and its neighbor are in a region of the frame having nearly uniform
motion. One way to take advantage of this spatial dependency is to code the
difference between a motion vector for the current block and the motion vector for a
neighboring block, called the predictor. The implementation uses a form of spatial
prediction to encode the X and Y components before assigning a joint entropy code.

WO 00/33581    PCT/US99/28395

Figure 3 is a block diagram illustrating how our implementation encodes motion 
vectors. The features shown in Fig. 3 are implemented in the encoder and operate on 
the motion vectors computed in the motion estimation block 34. First, the motion 
estimation block computes a motion vector for each macroblock in the frame. When a
frame consists of more than one video object plane, the motion estimation block
computes motion vectors for the macroblocks of each video object plane. 

The encoder begins coding the motion vector for each macroblock by 
computing a predictor for the current motion vector. The implementation shown in 
Fig. 3 selects a predictor from among neighboring macroblocks. Figure 4 shows an
example of the positioning of the candidates for the predictor relative to the current
macroblock for which the motion vector is being encoded. In this example, the 
candidate macroblocks include the ones to the left 400, above 402, and above-right 
404 relative to the current macroblock 406. The motion vectors for the candidate 
macroblocks are referred to as MV1, MV2, and MV3, respectively. 

As shown in Fig. 3, the encoder computes the predictor separately for the X
and Y components of the current macroblock. In particular, the motion vector
predictors 300, 302 compute the median of the X components and the median of the Y
components, respectively, of the motion vectors MV1, MV2, and MV3 of the candidate
macroblocks. Each median of three values is chosen as the predictor for the
corresponding component. The precise method of computing the predictor is not critical to
the invention and other ways of selecting a predictor are possible. One alternative is
to select a neighboring block located in the direction of the lowest gradient of the
neighboring motion vectors. Another alternative is to compute an average of motion
vectors of neighboring blocks.
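
By way of illustration only, the component-wise median can be sketched as follows. The function names are hypothetical, and motion vectors are represented as (x, y) tuples for convenience.

```python
def median3(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv1, mv2, mv3):
    """Component-wise median of the left, above, and above-right candidates."""
    return (
        median3(mv1[0], mv2[0], mv3[0]),
        median3(mv1[1], mv2[1], mv3[1]),
    )
```

Because each component is selected independently, the predictor need not equal any one candidate: for MV1 = (1.0, -0.5), MV2 = (0.5, 2.0), and MV3 = (3.0, 0.0), the predictor is (1.0, 0.0).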

Once the motion vector predictor selects the predictor, the encoder computes
differential motion vector components. For each X and Y component, the encoder
computes the difference between the component of the current motion vector and the
corresponding component of the predictor. As reflected by subtractor units 304, 306
in Fig. 3, the X component of the predictor is subtracted from the X component of the
current vector MVx, and the Y component of the predictor is subtracted from the Y
component of the current vector MVy.

The resulting differential X and Y components (MVDx and MVDy) are then
formed into a joint parameter that is coded with a single variable length code, or an
escape code followed by a fixed length code word for each differential component. The
implementation uses a joint Huffman coding table that is trained for a target bit rate
and video content. The joint entropy coder 308 looks up the joint parameter in the
table to find a corresponding variable length code. If the coder finds a match in the
table, it codes the joint parameter with a single variable length code. Otherwise, it
codes an escape code followed by a fixed length code word for each component.
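
By way of illustration only, the differential step and the table lookup with escape fallback can be sketched as follows. The toy codebook, the escape code, the 6-bit field width, and the two's-complement half-pel representation are all assumptions made for the example; they are not the codes of Table 1 or Table 2, and the text does not specify the fixed length field format.

```python
# Toy prefix-free codebook for illustration only (not taken from Table 1 or Table 2).
CODEBOOK = {
    (0.0, 0.0): "0",
    (-0.5, 0.0): "100",
    (0.0, -0.5): "101",
    (0.5, 0.0): "110",
}
ESCAPE_CODE = "111"  # assumed escape code
FIXED_BITS = 6       # assumed width of each escaped component field
HALF_PEL = 2         # components are multiples of 0.5, stored as integer half-pel units


def to_fixed(component):
    """Encode one component as a FIXED_BITS-wide two's-complement field."""
    units = int(round(component * HALF_PEL))
    limit = 1 << (FIXED_BITS - 1)
    if not -limit <= units < limit:
        raise ValueError("component out of range for the fixed length field")
    return format(units & ((1 << FIXED_BITS) - 1), f"0{FIXED_BITS}b")


def encode_mv(mv, predictor):
    """Return the bit string for one motion vector given its predictor."""
    mvd = (mv[0] - predictor[0], mv[1] - predictor[1])  # (MVDx, MVDy)
    code = CODEBOOK.get(mvd)
    if code is not None:
        return code  # a single variable length code covers both components
    return ESCAPE_CODE + to_fixed(mvd[0]) + to_fixed(mvd[1])
```

With this toy table, a vector of (1.5, -0.5) with predictor (1.0, -0.5) yields the differential (0.5, 0.0) and the 3-bit code "110", while a differential of (3.0, 1.0) falls back to the escape code followed by two 6-bit fields.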

The entropy codes 310 shown in Fig. 3 refer to the Huffman coding table. An 
example of a Huffman coding table trained for low bit rate, talking head applications is 
set forth at the end of this section in Table 1. Following Table 1, Table 2 is an
example of a Huffman table trained for more general video applications. While our 
implementation uses Huffman coding tables, the entropy codes can be computed using 
other forms of entropy coding such as arithmetic coding. 

Since the predictor is selected from motion vectors of neighboring blocks of 
pixels, the encoder applies special rules to address the situation where one or more 
neighboring blocks are outside the picture. Figure 5 illustrates cases where a 
neighboring block is outside the picture and shows the motion vectors that are used to 
predict the motion vector in the current macroblock. 

If one neighboring block is outside the picture (e.g., block 500 in Fig. 5), a zero 
motion vector (0,0) is used in its place. The predictor of the current macroblock 506 
is computed as the median of the zero motion vector, and motion vectors MV2 and 
MV3 for the other two neighboring macroblocks 502, 504. As another example, the 
configuration on the far right of Fig. 5 shows the case where the above-right
macroblock 524 is out of the picture. In this case, MV1 and MV2 for the other two
macroblocks 520, 522 inside the picture are used along with the zero motion vector
for the third macroblock 524 to predict the motion vector for the current macroblock
526.

If two candidate macroblocks 512, 514 are out of the picture (as shown in the
middle diagram of Fig. 5), then the motion vector for the third neighboring macroblock 
510 is selected as the predictor for the current macroblock 516. 
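
By way of illustration only, the picture boundary rules can be sketched as follows. A candidate outside the picture is represented as `None`, which is a convention of this sketch. The case where all three candidates are outside the picture is not addressed in the text; the zero predictor returned for it here is an assumption.

```python
ZERO_MV = (0, 0)


def predict_with_borders(candidates):
    """Predict from (left, above, above-right); None marks a block outside the picture."""
    inside = [mv for mv in candidates if mv is not None]
    if len(inside) == 1:
        # Two candidates are outside: use the remaining motion vector directly.
        return inside[0]
    if not inside:
        # Assumption: this case is not covered by the text.
        return ZERO_MV
    # At most one candidate is outside: substitute a zero vector, then take
    # the median of each component separately.
    padded = [mv if mv is not None else ZERO_MV for mv in candidates]
    xs = sorted(mv[0] for mv in padded)
    ys = sorted(mv[1] for mv in padded)
    return (xs[1], ys[1])
```

For example, with the left candidate outside the picture and MV2 = (2, 1), MV3 = (4, -3), the predictor is the median of (0, 0), (2, 1), and (4, -3), which is (2, 0).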

Figure 6 is a diagram illustrating an implementation of a decoder for decoding a
single variable length code representing joint motion vector components into X and Y
motion vector components. The joint entropy decoder 600 reads the variable length
code as input and finds the corresponding differential X and Y components in the
entropy codes 602. In the current implementation, the entropy codes are in the form of
a Huffman table (e.g., Table 1 or Table 2 listed below). As noted above, the encoder can
also use an alternative entropy coding scheme, in which case, the decoder would have
the appropriate codebook to correspond with the codebook used in the encoder.

In some cases, the motion vector may be coded with an escape code followed
by two fixed length codes representing the differential motion vector components. In
this case, the joint entropy decoder 600 recognizes the escape code and interprets the
following data as differential motion vector components instead of a variable length
code. It then passes the differential X and Y components to the next stage.

Next, the decoder forms the motion vector from the differential motion vector
components MVDx, MVDy and the X and Y components of the predictor. In particular,
the decoder adds each differential motion vector component MVDx, MVDy to the
corresponding X or Y component of the predictor (see adders 604, 606, Fig. 6). The decoder
computes the predictor components in the same way as the encoder. In particular, it
has a motion vector predictor that computes the predictor from the motion vectors
previously decoded for the three neighboring macroblocks (MVx1, MVy1), (MVx2,
MVy2) and (MVx3, MVy3). In the implementation, the motion vector predictor blocks
608 and 610 represent the computation of the median of the X and Y components,
respectively, of the neighboring macroblocks. As noted above, other ways of
computing the predictor are possible. Regardless of the specific form of prediction,
the decoder performs inverse prediction according to the prediction scheme used in the
encoder.

Once the motion vector for the current macroblock (MVx, MVy) is 
reconstructed, it is stored and used to decode the motion vector for neighboring
macroblocks according to the prediction scheme. 
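
By way of illustration only, the decoding steps can be sketched as follows. The sketch reads one prefix code from a string of bits, handles the escape case, and adds the predictor. The toy table, the escape code, and the 6-bit two's-complement half-pel fields are assumptions made for the example and are not the codes of Table 1 or Table 2.

```python
# Toy prefix-free table for illustration only (not taken from Table 1 or Table 2).
DECODE_TABLE = {
    "0": (0.0, 0.0),
    "100": (-0.5, 0.0),
    "101": (0.0, -0.5),
    "110": (0.5, 0.0),
}
ESCAPE_CODE = "111"  # assumed escape code
FIXED_BITS = 6       # assumed width of each escaped component field
MAX_CODE_BITS = max(len(ESCAPE_CODE), *(len(c) for c in DECODE_TABLE))


def from_fixed(bits):
    """Decode a two's-complement field of half-pel units into a component."""
    units = int(bits, 2)
    if units >= 1 << (FIXED_BITS - 1):
        units -= 1 << FIXED_BITS
    return units / 2


def read_mvd(bits, pos):
    """Read one joint code starting at bits[pos]; return ((MVDx, MVDy), new_pos)."""
    prefix = ""
    while len(prefix) < MAX_CODE_BITS and pos < len(bits):
        prefix += bits[pos]
        pos += 1
        if prefix in DECODE_TABLE:
            return DECODE_TABLE[prefix], pos
        if prefix == ESCAPE_CODE:
            end = pos + 2 * FIXED_BITS
            if end > len(bits):
                raise ValueError("truncated escape sequence")
            mvdx = from_fixed(bits[pos:pos + FIXED_BITS])
            mvdy = from_fixed(bits[pos + FIXED_BITS:end])
            return (mvdx, mvdy), end
    raise ValueError("invalid or truncated code")


def decode_mv(bits, pos, predictor):
    """Reconstruct (MVx, MVy) by adding the decoded differentials to the predictor."""
    (mvdx, mvdy), pos = read_mvd(bits, pos)
    return (predictor[0] + mvdx, predictor[1] + mvdy), pos
```

The predictor passed to `decode_mv` would be computed from previously decoded neighboring vectors in the same way as in the encoder, so each reconstructed vector must be stored before the next macroblock is decoded.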

The following tables provide examples of Huffman coding tables trained for 
talking head video (Table 1) and more general video content (Table 2). 



Table 1: XY Joint VLC Motion Vector Table for Talking Head Video

Table 2: XY Joint VLC Motion Vector Table for General Video

Index   Mv x    Mv y    Number of bits   Code
0       0       0       1                0
1       -0.5    0       5                10011
2       0       -0.5    5                10101
3       0.5     0       5                11001
4       -0.5    -0.5    5                11011
5       0       0.5     6                100100
6       0.5     -0.5    ...              ...
7       0.5     0.5     ...              ...
8       -0.5    ...     ...              ...
9       1       0       7                ...
10      -1      0       7                ...
11      0       -1      ...              ...
12      0       ...     ...              ...
13      ...     -0.5    8                ...
14      -1      -0.5    ...              11000111
15      1.5     ...     ...              ...
16      ...     ...     ...              ...
...     ...     ...     ...              11101010
18      ...     ...     ...              ...
21      ...     ...     ...              ...

Brief Overview of a Computer System

Figure 7 and the following discussion are intended to provide a brief, general 
description of a suitable computing environment in which the invention may be 
implemented. Although the invention or aspects of it may be implemented in a
hardware device, the encoder and decoder described above are implemented in 
computer-executable instructions organized in program modules. The program 
modules include the routines, programs, objects, components, and data structures that 
perform the tasks and implement the data types described above. 

While Figure 7 shows a typical configuration of a desktop computer, the
invention may be implemented in other computer system configurations, including
hand-held devices, multiprocessor systems, microprocessor-based or programmable
consumer electronics, minicomputers, mainframe computers, and the like. The
invention may also be used in distributed computing environments where tasks are
performed by remote processing devices that are linked through a communications
network. In a distributed computing environment, program modules may be located in
both local and remote memory storage devices.

Figure 7 illustrates an example of a computer system that serves as an
operating environment for the invention. The computer system includes a personal
computer 720, including a processing unit 721, a system memory 722, and a system
bus 723 that interconnects various system components including the system memory
to the processing unit 721. The system bus may comprise any of several types of bus
structures including a memory bus or memory controller, a peripheral bus, and a local
bus using a bus architecture such as PCI, VESA, MicroChannel (MCA), ISA and EISA,
to name a few. The system memory includes read only memory (ROM) 724 and
random access memory (RAM) 725. A basic input/output system 726 (BIOS),
containing the basic routines that help to transfer information between elements within
the personal computer 720, such as during start-up, is stored in ROM 724. The
personal computer 720 further includes a hard disk drive 727, a magnetic disk drive
728, e.g., to read from or write to a removable disk 729, and an optical disk drive
730, e.g., for reading a CD-ROM disk 731 or to read from or write to other optical
media. The hard disk drive 727, magnetic disk drive 728, and optical disk drive 730
are connected to the system bus 723 by a hard disk drive interface 732, a magnetic
disk drive interface 733, and an optical drive interface 734, respectively. The drives
and their associated computer-readable media provide nonvolatile storage of data, data
structures, computer-executable instructions (program code such as dynamic link
libraries and executable files), etc. for the personal computer 720. Although the
description of computer-readable media above refers to a hard disk, a removable
magnetic disk and a CD, it can also include other types of media that are readable by a
computer, such as magnetic cassettes, flash memory cards, digital video disks,
Bernoulli cartridges, and the like.

A number of program modules may be stored in the drives and RAM 725,
including an operating system 735, one or more application programs 736, other
program modules 737, and program data 738. A user may enter commands and
information into the personal computer 720 through a keyboard 740 and pointing
device, such as a mouse 742. Other input devices (not shown) may include a
microphone, joystick, game pad, satellite dish, scanner, or the like. These and other
input devices are often connected to the processing unit 721 through a serial port
interface 746 that is coupled to the system bus, but may be connected by other
interfaces, such as a parallel port, game port or a universal serial bus (USB). A monitor
747 or other type of display device is also connected to the system bus 723 via an
interface, such as a display controller or video adapter 748. In addition to the monitor,
personal computers typically include other peripheral output devices (not shown), such
as speakers and printers.

The personal computer 720 may operate in a networked environment using
logical connections to one or more remote computers, such as a remote computer
749. The remote computer 749 may be a server, a router, a peer device or other
common network node, and typically includes many or all of the elements described
relative to the personal computer 720, although only a memory storage device 750
has been illustrated in Figure 7. The logical connections depicted in Figure 7 include a
local area network (LAN) 751 and a wide area network (WAN) 752. Such networking
environments are commonplace in offices, enterprise-wide computer networks,
intranets and the Internet.

When used in a LAN networking environment, the personal computer 720 is
connected to the local network 751 through a network interface or adapter 753.
When used in a WAN networking environment, the personal computer 720 typically
includes a modem 754 or other means for establishing communications over the wide
area network 752, such as the Internet. The modem 754, which may be internal or
external, is connected to the system bus 723 via the serial port interface 746. In a
networked environment, program modules depicted relative to the personal computer
720, or portions thereof, may be stored in the remote memory storage device. The
network connections shown are merely examples and other means of establishing a
communications link between the computers may be used.

Conclusion 

While the invention has been illustrated using a specific implementation as an
example, the scope of the invention is not limited to the specific implementation
described above. Spatial prediction effectively exploits the spatial dependency of
motion vectors and improves the efficiency of jointly coding motion vectors with a
single entropy code. However, the specific form of prediction used on the motion
vectors is not critical to the invention. In fact, it is possible to implement the invention
without using a prediction scheme.
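The role of prediction can be sketched as follows. This minimal Python example assumes a component-wise median of three neighboring motion vectors as the predictor; that choice and the function names are assumptions for illustration only, since the form of prediction is not critical. Passing a zero predictor corresponds to coding the motion vector without prediction.

```python
from statistics import median


def predict_motion_vector(neighbors):
    """Component-wise median of neighboring motion vectors (assumed predictor)."""
    return (median(mv[0] for mv in neighbors),
            median(mv[1] for mv in neighbors))


def differential(mv, predictor):
    """Encoder side: differential x and y components to be jointly coded."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def reconstruct(diff, predictor):
    """Decoder side: add the predictor back to recover the motion vector."""
    return (diff[0] + predictor[0], diff[1] + predictor[1])
```

Because neighboring blocks tend to move together, the differential pair clusters near (0, 0), which is what allows short codes to cover most blocks.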

The implementation described above specifically uses a Huffman coding 
scheme to compute entropy codes for a joint motion vector parameter. As noted, it is 
also possible to use other forms of entropy coding to encode the joint parameter with 
a single entropy code. 
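As a rough illustration of how a Huffman code for the joint parameter might be derived, the following Python sketch builds a code from a list of observed (x, y) pairs. The function name and the training samples in the usage note are hypothetical; the sketch uses the textbook Huffman construction and is not the training procedure of the implementation described above.

```python
import heapq
from collections import Counter
from itertools import count


def build_joint_huffman(pairs):
    """Map each distinct (x, y) pair to a prefix-free bit string.

    More frequent pairs never receive a longer code than less frequent ones.
    """
    freq = Counter(pairs)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {pair: ""}) for pair, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, codes0 = heapq.heappop(heap)  # least frequent subtree
        n1, _, codes1 = heapq.heappop(heap)  # next least frequent subtree
        merged = {pair: "0" + code for pair, code in codes0.items()}
        merged.update({pair: "1" + code for pair, code in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]
```

For example, calling `build_joint_huffman` on a sample in which the pair (0, 0) dominates assigns that pair a one-bit code and gives the rarer pairs longer codes, which mirrors the structure of the tables above.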

In view of the many possible implementations of the invention, it should be
recognized that the implementation described above is only an example of the invention
and should not be taken as a limitation on the scope of the invention. Rather, the
scope of the invention is defined by the following claims. We therefore claim as our
invention all that comes within the scope and spirit of these claims.

We claim:

1. In a video coder for coding video images in a block format, a method for
improving compression of the video images comprising:

predicting x and y motion vector components for a current block of pixels
based on a motion vector of at least one neighboring block of pixels to compute x and
y components of a predictor motion vector;

computing differential x and y components from the x and y components of
the predictor and x and y components of a motion vector for the current block; and

assigning a single variable length code to joint x and y differential motion
vector components, such that shorter variable length codes are assigned to joint
differential motion vector components that have a higher probability of occurrence in
the video images, and longer variable length codes are assigned to joint differential
motion vector components that have a lower probability of occurrence.

2. The method of claim 1 wherein the variable length codes are assigned from
a variable length code table comprising a list of pairs of joint differential motion vector
components and a corresponding variable length code for each pair of joint differential
motion vector components.

3. The method of claim 2 wherein the assigning step includes:

looking up the joint differential motion vector components in the table; and

when no match is found in the table, coding an escape code along with a fixed
length code for each differential motion vector component.

4. The method of claim 1 wherein the block of pixels corresponds to a
macroblock in a video frame divided into fixed-sized, rectangular macroblocks, and the
predicting, computing, and assigning steps are repeated for the macroblocks in the
video frame.

5. The method of claim 1 wherein the block of pixels corresponds to a
macroblock of a video object plane in a video frame having two or more video object
planes, and the video object planes are each divided into fixed-sized, rectangular
macroblocks; and

the predicting, computing and assigning steps are repeated for the macroblocks
in the video object planes.

6. A computer readable medium having instructions for performing the steps
of claim 1.

7. In a video decoder, a method for decoding macroblocks of a predicted
video frame comprising:

receiving a single variable length code representing joint x and y components
of a motion vector for each of the macroblocks;

for each of the macroblocks, searching for a single entry in an entropy
codebook corresponding to the variable length code and including the x and y
components of the motion vector; and

using the x and y components of the motion vector from the codebook to
define motion of pixels in a corresponding macroblock.

8. The method of claim 7 wherein the x and y components of the motion
vector in the codebook comprise x and y differential motion vector components, and
the method comprises:

reconstructing the motion vector from the differential motion vector
components and x and y components of a predictor motion vector.

9. The method of claim 7 wherein the codebook is a Huffman table trained for
a target bit rate and content type from a statistical analysis of example video
sequences having the content type.

10. A computer readable medium having instructions for performing the steps
of claim 7.

11. A motion vector encoder comprising:

a motion vector predictor for computing a motion vector predictor for a motion
vector of a block of pixels from at least one motion vector for a neighboring block of
pixels;

a subtractor for computing differential motion vector components from motion
vector components of the predictor and the motion vector of the block of pixels; and

a joint entropy coder for jointly coding the differential motion vector
components with a single variable length code.

12. The encoder of claim 11 wherein the joint entropy coder computes the
single variable length code by searching for the code in a Huffman coding table
comprising a list of joint differential motion vectors and a corresponding variable length
code for each of the joint differential motion vectors.

13. A motion vector decoder comprising:

a motion vector predictor for computing a motion vector predictor for a motion
vector of a block of pixels from at least one motion vector for a neighboring block of
pixels;

a joint entropy decoder for decoding a single variable length code into joint
differential motion vector components; and

an adder for reconstructing X and Y motion vector components from the joint
differential motion vector components and X and Y components of the motion vector
predictor.

14. The decoder of claim 13 wherein the joint entropy decoder decodes the
single variable length code by searching for the code in a Huffman coding table
comprising a list of variable length codes and corresponding joint differential motion
vector components for each of the variable length codes.

15. The decoder of claim 13 wherein the joint entropy decoder is operable to
detect an escape code indicating that two fixed length codes representing X and Y
differential motion vector components follow the escape code.

16. In a video coder for coding video images in a block format, a method for
improving compression of the video images comprising:

computing x and y motion vector components for a block;

forming the x and y motion vector components into a joint parameter
representing joint x and y motion vector components; and

assigning a single variable length code to the joint x and y motion vector
components, such that shorter variable length codes are assigned to joint motion
vector components that have a higher probability of occurrence in the video images,
and longer variable length codes are assigned to joint differential motion vector
components that have a lower probability of occurrence.

17. The method of claim 16 further including spatially predicting the x and y
motion vector components from a neighboring block of the block; and using spatially
predicted components as the joint x and y motion vector components.

18. The method of claim 17 wherein the spatially predicted components are
differential motion vector components computed as a difference between x and y
components of the motion vector for the block and x and y components of a predictor
motion vector.

19. In a video decoder, a method for decoding macroblocks of a predicted 
video frame comprising: 

receiving a single variable length code representing joint differential x and y 
components of a motion vector for each of the macroblocks; 

for each of the macroblocks, searching for a single entry in a Huffman table 
corresponding to the variable length code and including the joint differential x and y 
components of the motion vector; 

computing x and y components of a predictor motion vector from neighboring 
macroblocks to the macroblock currently being decoded; and 

reconstructing the motion vector from the differential components obtained 
from the Huffman table and the x and y components of the predictor motion vector. 